Identifiers: Project TALENT

Three postdoctoral fellows completed a 38-week training program designed to
familiarize scientists already experienced in educational research with the techniques
of designing and executing a large-scale, long-range educational research project.
The program was conducted by the research staff of Project TALENT, a project of
the Institute for Research in Education of the American Institutes for Research.
Trainees participated in a series of 4 seminars: Project TALENT Seminar, Computer 
Applications to Educational Research, Statistical Analysis, and Research Methodology 
Applicable to Large-Scale Educational Research. In addition, each trainee conducted 
an individual research effort. Among the factors contributing to the success of the 
program were the interaction between participants and the research community in 
general, the individualization of the program, and the computer facilities available for 
trainee use. All 3 postdoctoral fellows have received faculty appointments at 
institutions of higher education and thus will have an opportunity to contribute to the 
training of other research workers. Appended are abstracts of the research 
accomplished by 2 of the trainees and 3 joint papers produced by 2 of them. (JS) 
INTRODUCTION



This report discusses the operation of an academic-year (38-week)
postdoctoral training program. The program was initiated
September 1, 1966 and terminated May 31, 1967. Three postdoctoral 
fellows were selected for and completed the program. The objectives 
of the program were to familiarize scientists already experienced 
in educational research with the techniques of designing and executing 
a large-scale, long-range educational research project. The specific 
competencies developed by the program were as follows: 

1. an understanding of computer techniques and capabilities;

2. statistical procedures applicable to large-scale educational
research; and

3. research strategy appropriate to large data files.

The training program was conducted in the research setting of Project
TALENT, a project of the Institute for Research in Education of the
American Institutes for Research.

Description of the Program 

The program was conducted by the research staff of Project TALENT.
The training program consisted of a series of four seminars, in which
each trainee participated. In addition, an individual research effort
has been, or is in the process of being, completed by each of the
three postdoctoral fellows. Each of these efforts used a portion of the
data collected in conjunction with Project TALENT.
The four seminars are described below.

1. Project TALENT Research Seminar, Chairman: Marion F. Shaycoft.
This seminar included 1) background information about
Project TALENT; 2) discussion of the sampling procedure
and the sample; 3) the tests, inventories, and questionnaires
used in conjunction with Project TALENT; and 4) discussion of
findings from past research using Project TALENT data,
and a discussion of current research. In the area of past
and present research, findings concerning the American high
school student and the American high school were presented,
as were findings based on the follow-ups one year and five years
after graduation from high school. Problems in psychometric
theory were discussed with special reference to the manner
in which they impinged on Project TALENT research and the
solutions that have been applied.

2. Seminar on Computer Applications to Educational Research,
Chairman: Paul R. Lohnes.

This seminar included the following topics: 1) programming
considerations involved in generating correlation matrices,
inverting symmetric matrices, and finding their eigenvalues and
eigenvectors. Each of the three participants became conversant
with the computer language FORTRAN, the problems involved,
and what the operating system does in compiling and executing
a program; 2) the details of a large-scale computer installation
and its associated features; and 3) technical considerations in
organizing, maintaining, updating, and effectively using a
large-scale data file.
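The three computations named in topic 1 can be illustrated with a short modern sketch. It is written in Python with NumPy rather than the FORTRAN used in the seminar, the function name `correlation_workup` is hypothetical, and the score matrix is randomly generated for illustration only; it is not Project TALENT data.

```python
import numpy as np

def correlation_workup(scores):
    """Correlation matrix, its inverse, and its eigenstructure.

    scores: array of shape (n_students, n_variables).
    """
    r = np.corrcoef(scores, rowvar=False)   # variables are the columns
    r_inv = np.linalg.inv(r)                # inverse of the symmetric matrix
    # eigh is specialized for symmetric matrices; eigenvalues come back ascending
    eigvals, eigvecs = np.linalg.eigh(r)
    return r, r_inv, eigvals, eigvecs

# Hypothetical data: 200 students, 4 test scores, with scores 0 and 1 correlated.
rng = np.random.default_rng(0)
scores = rng.standard_normal((200, 4))
scores[:, 1] += 0.8 * scores[:, 0]

r, r_inv, eigvals, eigvecs = correlation_workup(scores)
print(np.round(eigvals, 3))
```

The eigenvalues of a correlation matrix sum to its trace, which is the number of variables (here 4); this makes a convenient check on any hand-coded eigenvalue routine.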
3. Seminar on Statistical Analysis, Chairman: Charles E. Hall.

The following topics were presented to the three trainees in
approximately the given order: 1) correlational analysis;
2) principal components analysis, principal factor analysis, and
mechanized rotational procedures; 3) multiple and canonical
correlation; 4) the central limit theorem and the variance ratio;
5) Student's t-test and simple factorial univariate analysis
of variance; 6) the general linear hypothesis model; and 7)
multivariate analysis of variance, with discriminant analysis
as a subtopic.

4. Seminar on Research Methodology Applicable to Large-Scale
Educational Research, Chairman: William W. Cooley.

In this seminar, Project TALENT scientists and the postdoctoral
trainees discussed the methodological considerations involved
in ongoing research. Successive sessions of this seminar were
devoted to presentations of research being conducted by various
members of the Project TALENT staff. In addition, each of the
postdoctoral participants presented plans and progress regarding
their own research with the Project TALENT data.

Evaluation of the Program 

In general, all aspects of the training program were undertaken
and accomplished as originally planned. The objectives were found
to be quite realistic for a nine-month training effort. Because there
were twice as many research staff members directly involved in the
training program as there were trainees participating, the program of
instruction was both comprehensive and individualized. At the time the
program was proposed, it was realized that such an undertaking, no
matter how ambitious, could not hope to fulfill the need of the
educational community for persons skilled in computer and multivariate
applications to educational research. For this reason, one of the
selection factors was potential for contribution to the training of
other research workers. Each of the three postdoctoral fellows selected
for participation in the program has received a faculty appointment at
an institution of higher education and thus will have an outstanding
opportunity to contribute to the training of other research workers.

Several features of the program deserve special mention. The first
is the support provided by the TALENT staff for the individual
research undertaken by each of the three participants. To
facilitate this research, the services of the Editorial Assistant,
the Research Assistants, the Computer Programmers, and many
others were made available to the postdoctoral fellows. Another
feature worthy of mention was the set of facilities made available
to the fellows. Each was provided with virtually unlimited access
to the several computers regularly used by the staff of Project
TALENT.

An unanticipated, but nevertheless welcome, feature was the
opportunity for the three postdoctoral fellows to interact with
the research community in general. An example of this was the
opportunity the three postdoctoral fellows had to spend an afternoon
with Bert Green, Chairman of the Department of Psychology,
Carnegie-Mellon University. In that afternoon, they were briefed on the
advanced work underway at Carnegie-Mellon University on
the application of computers to behavioral research. In addition,
each of the three fellows was given opportunities to interact
with the faculty of many departments of the University of Pittsburgh.

Among the departments making faculty members available for discussion
with the postdoctoral fellows were the Department of Educational
Research, the Computer Center, the Department of Sociology, the
Political Science Department, the Knowledge Availability Center, the
Learning and Research Development Corporation (an R&D Center established
by the OE and directed by Robert Glaser), and the Business School.
Interaction with members of the research community gave the
postdoctoral fellows a special opportunity to put into perspective
the individual research they undertook.

Still another feature worthy of mention was the individualization
of the program. Aside from the four ongoing seminars, each
of the three postdoctoral fellows had ample opportunity to work
with those research staff members whose interests were similar to
their own, or whose capabilities were uniquely suited to their
individual research.

The fact that there were six research staff members and three postdoctoral
fellows enabled the instruction to be given at a much more individual and
personal level than would otherwise have been possible. One last
strength of the program deserving mention is the quality of the
three postdoctoral participants. Whereas the late announcement of
the initial awards handicapped other programs in selecting students,
it was not an especially potent factor in affecting the quality of
this program. An immediate and hard-hitting publicity campaign
following the announcement of support for the program produced widespread
interest and numerous applications for participation. As a result,
it was possible to select from the applicants the three candidates
who best met the criteria established in the proposal: 1) unusual
career achievements; 2) the ability to benefit from the proposed
training program; and 3) interest in, and potential for, outstanding
contributions to educational research and to the training of other
research workers.

The major difficulty encountered in the program was the speed
with which seminar and research activity had to proceed to provide
in-depth coverage of the material presented. Ideally, the program
would have been of slightly longer duration, giving the postdoctoral
fellows the opportunity to assimilate the topics covered more
thoroughly.

The overall evaluation of the program is highly favorable. Objective
evidence to support this evaluation comes from three sources.
First are the products of the three postdoctoral fellows. Attachment
A of this report includes abstracts of the research accomplished
by two of the postdoctoral trainees. Attachment B includes three
joint papers produced by two of the postdoctoral fellows as a direct
result of their participation in the program. A second source of
objective evidence is the positions obtained by the three fellows
upon completion of their postdoctoral training. As mentioned, all
three have joined the faculties of institutions of higher education
and thus will have many continuing opportunities to build on
and disseminate the experience gained during the course of their
postdoctoral education. The third source of evidence is the opinions
of the three postdoctoral fellows. Each was provided with several
opportunities to evaluate the progress of the program during the
course of the nine months. Suggestions for improvements were
incorporated into the program whenever possible. At the conclusion
of the program, the trainees were informally asked to give their
opinions of the overall program. All three were quite positive in
their evaluation of the experience gained in the course of the
postdoctoral program. The major criticism concerned the short duration
of the program.

The biggest disappointment on the part of those concerned with
this program is the fact that it will not be permitted to continue.
The original proposal outlined four one-year postdoctoral programs,
the last three of which would have built on the experience gained
from the first. We feel that we have both put together a good
program and acquired the experience necessary to expand it. Despite
this, we have been assigned no postdoctoral fellows for the coming
academic year.

It should also be mentioned that during the course of the past
nine or ten months we have had serious inquiries regarding our program
from approximately 30 persons. In addition to these persons, there were
many qualified applicants who, because of the short notice, were
unable to apply for participation during the past year. In light of
the success in both enrolling three postdoctoral fellows and offering
them a well-planned nine-month program, the current procedures
used by the Research Training Branch of the U.S. Office of
Education make little sense. The necessity for curtailing the
postdoctoral aspect of the research training program is
understandable. The reason that the Project TALENT program will not be
allowed to continue is hard to understand.

If the Office of Education continues to select postdoctoral fellows by
means of national competition, it is suggested that efforts be made to
provide qualified institutions with a greater opportunity of acquiring
fellows interested in being located at those institutions.

Program Reports 

1. Publicity 

In addition to the announcement published in the AERA's Educational
Researcher, the announcement included as Attachment C was sent
to approximately 1,200 persons on the Project TALENT mailing list
in late June 1966. These 1,200 persons included the Project TALENT
regional coordinators, college professors, and other professionals
who have, from time to time, indicated interest in Project TALENT.

2. Application Summary

a. Approximate number of inquiries from prospective trainees: 15. 

b. Number of completed applications received: 8. 

c. Number of first-rank applications: 5. 

d. Number of applicants offered admission: 4.

3. Trainee Summary 

a. Number of trainees initially accepted to the program: 3. 

Number of trainees enrolled at beginning of program: 3. 

Number of trainees who completed program: 3.
b. Categorization of trainees



Number of trainees who are principally 
elementary or secondary public school teachers: 0 

Number of trainees who are principally local 
public school administrators or supervisors: 0 

Number of trainees from colleges or universities,
junior colleges, research bureaus, etc.: 3



4. Program Director's Attendance

As described earlier, the program covered a nine-month period
beginning September 1, 1966, and ending May 31, 1967. The
trainees were present continuously during this nine-month
interval, except for the normal holiday and vacation schedule
applicable to employees of the American Institutes for Research.
The Director and all research staff of Project TALENT were
present in accordance with outlined policy.

5. Financial Summary

                                            Budgeted              Expended or Committed
a. Trainee Support
   (1) Stipends                             $8,500 per trainee    $25,500
   (2) Dependency Allowance                 0                     0
   (3) Travel (Relocation)                  500 per trainee       1,500
b. Direct Costs (Institutional Allowance)   3,000                 3,000
c. Indirect Costs                           0                     0
TOTAL                                       $30,000               $30,000
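As a quick consistency check, the per-trainee figures in the Budgeted column can be multiplied out for the three trainees and summed. The small Python sketch below does this; the variable names are illustrative and do not come from the report.

```python
# Budget arithmetic from the Financial Summary (amounts in dollars).
N_TRAINEES = 3

stipend_per_trainee = 8_500
travel_per_trainee = 500          # relocation travel
dependency_allowance = 0
institutional_allowance = 3_000   # direct costs
indirect_costs = 0

stipends = stipend_per_trainee * N_TRAINEES
travel = travel_per_trainee * N_TRAINEES
total = stipends + dependency_allowance + travel + institutional_allowance + indirect_costs

print(f"Stipends ${stipends:,}; travel ${travel:,}; total ${total:,}")
```

The computed stipend ($25,500), travel ($1,500), and overall ($30,000) figures agree with the Expended or Committed column.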
Attachment A



Abstracts of Projects of Research Fellows




Effect of Negro Density on Student Variables and the Post-High 
School Adjustment of Male Negroes 

David E. Kapel 

The major concern of this study was to evaluate the effects
of Negro density, community, and regional differences on post-high
school adjustment and student factors for Negro males. Three
specific null hypotheses were tested. Two were rejected as a result
of analyses that found (1) that environmental-parameter groups could
be distinguished from each other and (2) that significant differences
were generated by regional influences, but not by community and
Negro density factors. The third null hypothesis was not rejected,
since the analyses found no significant environmental
factors influencing the types of post-high-school education acquired and
projected.

The rejection of the first two hypotheses might have been a
function of the mediating influence of environmental factors on
student and employment variables, vis-a-vis social status, amounts
spent on education, quality of education, and occupational
opportunities across environmental levels, while the nonrejection of
the third hypothesis indicated that environmental factors did not
significantly influence the educational goals that were studied.

It is also apparent that certain variables provided better
discriminatory power than others, and that a multivariate approach
gives a clear picture of the important and significant variables
that need to be studied.



Role Expectancies for American Adolescents 
William A. Love, Jr. 



This study deals with the relationship between personality,
abilities, sex, and sociometric standing. The researcher attempted
to define role expectancies for American adolescents. Since
sociometric status may be taken as an index of the acceptance accorded
an individual, if the personality and ability traits held by these
individuals are analyzed, the traits that are valued can be assessed.
Since this study considered both same-sex and cross-sex choices, the
researcher was able to get some idea of what was valued within
sex and across sex.

The second aspect of the study was methodological. Techniques
using canonical correlation, developed by Douglas K.
Stewart and this researcher, were employed in the analysis. Since
these techniques are new, this study functioned as a tryout of their
usefulness.



Attachment B 



Joint Papers by Research Fellows 




ASSESSING THE RELATIVE IMPORTANCE OF VARIABLES IN THE CANONICAL SOLUTION 

William Love and Douglas Stewart 

Canonical correlation has proved worthwhile in various studies
of behavioral data. Because a canonical correlation is the
correlation between two linear composites, the correlation does not inform
us of the relative importance of individual variables. The
interpretation of a given canonical correlation is greatly aided by
following Meredith's (1964) suggestion that the correlation between
an observed variable and the canonical variate be computed
(hereafter referred to as a "canonical loading"). Consider two sets
of variables designated P and Q (for convenience, the P set will be
considered the predictor set and the Q set will be considered the
criterion). Given a variables-by-canonical-variates matrix of
squared loadings ($L_q$), $L_{ij}$ represents the proportion of variance
of the $i$th variable associated with the $j$th canonical variate.

Noting that $r_{ik}^2 = r_{ij}^2 \, r_{jk}^2$ (where $r_{ik \cdot j} = 0$), we may multiply the squared
canonical loading ($L_{ij}$) by the squared canonical correlation ($\lambda_j$)
in order to determine the proportion of variance of the $i$th variable
of the Q set predicted from the $j$th canonical composite of the P set.
If for the $i$th variable we sum the proportions predicted from each
of the canonical composites of the P set, we have the total proportion
of variance in the $i$th variable predicted by the canonical solution.
Thus, if all canonical roots are extracted, this sum is the value of
the squared multiple correlation between the $i$th variable of the Q set
and all the variables of the P set.

¹The authors wish to express their appreciation to Paul R. Lohnes,
who encouraged and guided the present effort while they were Office
of Education Post-Doctoral Fellows (O.E.G. 1-6-062084) at Project
TALENT.

Where $L_q$ is a matrix of squared canonical loadings of $M_q$ variables,
$\lambda$ is a column vector of squared canonical correlations, and
$H_q$ is a column vector of squared multiple correlations, in the case
of the full canonical solution (i.e., all canonical roots removed):

$$H_q = L_q \lambda$$

The mean of the elements of $H_q$ can be interpreted as the
proportion of variance in the Q set predicted from the P set (designated
$\bar{R}$). It will also be noted that the column sum of squared loadings for
the $j$th column, when multiplied by the $j$th squared canonical
correlation and divided by $M_q$ (the rank of the Q set), is interpretable as
the proportion of variance in the Q set predicted by the $j$th canonical
root from the P set, and is therefore instructive in determining which
canonical roots bear interpretation.
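The computation can be sketched in Python with NumPy, assuming raw score matrices for the two sets and $M_q \le M_p$ so that every Q-set root is extracted. The canonical solution below is obtained from a singular value decomposition of the whitened cross-correlation matrix, a standard modern route that is not necessarily the procedure the authors used; the function names and the random data are hypothetical. Because $H_q$ should equal the squared multiple correlations when all roots are extracted, the result can be checked against ordinary least-squares regressions.

```python
import numpy as np

def inv_sqrt(r):
    """Inverse symmetric square root of a positive-definite correlation matrix."""
    vals, vecs = np.linalg.eigh(r)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def canonical_redundancy(p_data, q_data):
    """Compute H_q = L_q lambda, R-bar, and the per-root proportions.

    p_data: (n, M_p) predictor set; q_data: (n, M_q) criterion set.
    Assumes M_q <= M_p, so every Q-set root is extracted (full solution).
    """
    m_p, m_q = p_data.shape[1], q_data.shape[1]
    r = np.corrcoef(np.hstack([p_data, q_data]), rowvar=False)
    r_pp, r_qq, r_pq = r[:m_p, :m_p], r[m_p:, m_p:], r[:m_p, m_p:]
    # Singular values of the whitened cross-correlation are the canonical correlations.
    _, rc, vt = np.linalg.svd(inv_sqrt(r_pp) @ r_pq @ inv_sqrt(r_qq),
                              full_matrices=False)
    weights_q = inv_sqrt(r_qq) @ vt.T    # Q-set weights giving unit-variance variates
    loadings_q = r_qq @ weights_q        # canonical loadings: corr(Q variable, Q variate)
    lam = rc ** 2                        # squared canonical correlations (lambda)
    l_q = loadings_q ** 2                # L_q: squared canonical loadings
    h_q = l_q @ lam                      # H_q = L_q lambda
    r_bar = h_q.mean()                   # proportion of Q variance predicted from P
    per_root = l_q.sum(axis=0) * lam / m_q   # Q variance predicted by each root
    return h_q, r_bar, per_root

# Hypothetical data: 4 predictor (P) variables, 3 criterion (Q) variables.
rng = np.random.default_rng(0)
p = rng.standard_normal((500, 4))
q = p[:, :3] @ rng.standard_normal((3, 3)) + rng.standard_normal((500, 3))

h_q, r_bar, per_root = canonical_redundancy(p, q)
print("H_q:", np.round(h_q, 3), "R-bar:", round(float(r_bar), 3))
```

The per-root proportions sum to $\bar{R}$, so they partition the predicted variance among the canonical roots.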

To demonstrate the techniques described above, the authors have
reanalyzed data presented by Lohnes (1966), who factored two sets of
measures which he termed 1. Abilities (designated L) and 2. Motives
(designated R).



The factors of the abilities domain are: 1. Verbal Knowledge;
2. Perceptual Speed and Accuracy; 3. Mathematics; 4. Hunting-Fishing;
5. English Language; 6. Visual Reasoning; 7. Color, Foods;
8. Etiquette; 9. Memory; 10. Screening; 11. Games. In the motives
domain: 1. Business Interests; 2. Conformity Needs; 3. Scholasticism;
4. Outdoors, Shop Interests; 5. Cultural Interests; 6. Activity
Level; 7. Impulsion; 8. Science Interests; 9. Sociability;
10. Leadership; 11. Introspection.

Table 1 shows the canonical loadings and correlations for the
two sets. Given that $M_L = M_R$, where $M$ is the rank of the sets, all
variance is extracted from both sides. Table 2 presents the column
vectors $H_L$ and $H_R$, which contain squared multiple correlations. The
mean of the first column ($\bar{R}$) is interpretable as the proportion of set
variance predicted by the variables of the opposing set. Column 2
presents each squared multiple correlation as a proportion of the sum
of the first column and therefore can be interpreted as the proportion
of $\bar{R}$ attributable to each variable. The proportion of left variance
predicted by the right set of variables and the proportion of
right variance predicted by the left set of variables are both
approximately .10, indicating relative independence between the two
sets. The proportion of $\bar{R}$ (column 2 of Table 2) for each variable
is useful for describing the area of redundant variance. In the
abilities (left) set, Verbal Knowledge (.270), Mathematics (.207), and
English Language (.121) are the important variables. In the motives
(right) set, Scholasticism (.241) and Science Interests (.152) are
the major contributors. While the overlap between the two systems is
approximately 10 per cent, the area of overlap tends to be the result
of the relationship between academic ability variables in the left
set and academic interest variables in the right set.
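The arithmetic behind the two columns of Table 2 can be reproduced from the printed left-set $R^2$ values with a few lines of plain Python (variable names are illustrative). Because the printed $R^2$ values are themselves rounded, the recomputed results match the table only to within rounding: the mean comes out near .0985 against the printed .098, and variable 8 gives .040 against the printed .039.

```python
# R^2 of each ability (left-set) variable with the motives set, as printed in Table 2.
left_r2 = [.293, .067, .224, .072, .131, .073, .016, .043, .011, .076, .078]

total = sum(left_r2)
r_bar = total / len(left_r2)              # mean R^2, i.e. R-bar for the left set
shares = [r2 / total for r2 in left_r2]   # proportion of R-bar due to each variable

print(f"R-bar = {r_bar:.4f}")
for number, share in enumerate(shares, start=1):
    print(f"{number:2d}  {share:.3f}")
```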



The problem to which this paper has been addressed is the
assessment of the relative importance of various variables in the canonical
solution. We have suggested a summary measure for determining the
proportion of variance of one set predicted by another set ($\bar{R}$). The
relative contributions of variables to this general index have
therefore been proposed as an indication of the relative importance of
the variables to the canonical solution. It should be emphasized
that $\bar{R}$ is the mean of squared multiple correlations only when all roots
are removed (which is to say, $H$ contains $R^2$'s when all roots are
considered, but its elements are smaller if fewer than all roots are considered).
Researchers may on occasion wish to impose criteria as to which roots
are used (such as significance levels), in which case $\bar{R}$ is no longer
the mean of squared multiple correlations.
Table 2

Abilities (left) set                       Motives (right) set
Variable   R²      R²_i / ΣR²              Variable   R²      R²_i / ΣR²
 1         .293    .270                     1         .068    .058
 2         .067    .062                     2         .118    .101
 3         .224    .207                     3         .282    .241
 4         .072    .066                     4         .098    .084
 5         .131    .121                     5         .114    .097
 6         .073    .067                     6         .098    .084
 7         .016    .015                     7         .086    .073
 8         .043    .039                     8         .177    .152
 9         .011    .010                     9         .084    .072
10         .076    .070                    10         .027    .023
11         .078    .072                    11         .018    .015
M          .098                            M
A SIMPLE ALGORITHM FOR COMPUTING MULTIPLE CORRELATIONS
FROM THE CANONICAL SOLUTION¹

William Love and Douglas Stewart



Canonical correlations are used increasingly by behavioral
researchers. Following Meredith (1964), many analysts choose to
interpret the correlations between observed variables and canonical
variates (hereafter referred to as canonical loadings) rather
than the weights which form the canonical variates. Given two sets
of variables (designated p and q), the multiple correlations
between each element of one set and all elements of the opposing set
can be simply computed. Given a matrix of squared canonical
loadings ($L_p$, where $L_p$ is a variable-by-canonical-variate matrix for
the p set) and a column vector of squared canonical correlations ($\lambda$),

$$R^2 = L_p \lambda$$

where $R^2$ is a column vector of squared multiple correlations between
each element of the p set and all elements of the q set. Thus, in
order to compute squared multiple correlations:



1. Square each element of a canonical loading matrix (forming $L_p$);

2. Multiply each element of the $j$th column of $L_p$ by the square
of the $j$th canonical correlation ($\lambda_j$);

3. The sum of the elements in the $i$th row is the squared
multiple correlation of the $i$th variable of the p set
with the variables of the q set.



It has also been noted that the sum of the $j$th column of this
matrix, when divided by $M_p$ (the number of variables in the p set), can
be interpreted as the proportion of variance in the p set accounted
for by the $j$th canonical root and is therefore instructive in
determining which canonical roots bear interpretation (two linear
composites may be well correlated without representing significant
portions of variance).
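The three steps, together with the column-sum interpretation, can be sketched in plain Python. The function name is hypothetical, and the example loadings are made up for illustration: they are a consistent full solution for two uncorrelated p-set variables (each row of squared loadings sums to 1), not values from any study.

```python
def squared_multiple_correlations(loadings, canonical_rs):
    """Three-step algorithm: R^2 of each p-set variable with the whole q set.

    loadings: rows = p-set variables, columns = canonical variates (full solution).
    canonical_rs: canonical correlation for each column.
    """
    lam = [rc ** 2 for rc in canonical_rs]       # squared canonical correlations
    # Steps 1 and 2: square each loading, then weight column j by lambda_j.
    weighted = [[(l ** 2) * lam[j] for j, l in enumerate(row)] for row in loadings]
    # Step 3: row sums are the squared multiple correlations.
    r2 = [sum(row) for row in weighted]
    # Column sums divided by M_p: proportion of p-set variance accounted for by each root.
    m_p = len(loadings)
    per_root = [sum(row[j] for row in weighted) / m_p for j in range(len(lam))]
    return r2, per_root

# Hypothetical two-variable, two-root example (illustrative numbers only).
r2, per_root = squared_multiple_correlations([[0.8, 0.6], [0.6, -0.8]], [0.6, 0.3])
print(r2, per_root)
```

Here `r2` comes out to approximately .2628 and .1872, and `per_root` to approximately .18 and .045; the per-root proportions sum to the mean of the squared multiple correlations (.225).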



¹This work was undertaken while the authors were Office of Education
Post-Doctoral Fellows (O.E.G. 1-6-062084) at Project TALENT.



A GENERAL CANONICAL CORRELATION INDEX 
Douglas Stewart and William Love 

Because a canonical correlation is the correlation between
two linear composites, it presents some interpretive problems.
No measure of the redundancy in one set of variables, given another
set of variables, has been available. A nonsymmetric index of
redundancy is proposed which represents the amount of predicted
variance in a set of variables.
A GENERAL CANONICAL CORRELATION INDEX¹

The interpretation of canonical correlations presents some
problems. Whereas a squared multiple correlation represents the
proportion of criterion variance predicted by the optimal linear
combination of predictors, a squared canonical correlation represents
the variance shared by linear composites of two sets of variables, and
not the shared variance of the two sets.

Unfortunately, therefore, canonical correlations cannot be
interpreted as correlations between sets of variables. It is important
to note that a relatively strong canonical correlation may obtain
between two linear functions, even though these linear functions may not
extract significant portions of variance from their respective batteries.
This is the problem of interpretation to which this paper is addressed.

Rozeboom (1965) has suggested the relevance of information-theoretic
concepts in dealing with canonical correlations. Uncertainty
and alienation are considered parallel, and similarly, redundancy and
correlation are treated as analogous. Given this approach, Rozeboom
develops a general index which is similar to one presented by Anderson
(1958, p. 244). Both measures are symmetric; i.e., given two sets of
variables, a single number is presented which represents the magnitude of
their intersection. A directional, or nonsymmetric, index is possible
by pursuing the information-theoretic analogues suggested by Rozeboom.

In addition to the primitive concept of uncertainty (or entropy),
Shannon (Shannon and Weaver, 1949) discusses conditional uncertainty.
Similarly, one may discuss the complement of conditional uncertainty
as conditional redundancy. A nonsymmetric measure is considered
desirable because one set of variables may be almost completely
subsumed by a larger set; i.e., redundancy can be represented as the
intersection of two sets of variables, and it is desirable to represent
the proportion of one set which is in the intersection (see Fig. 1).

¹The authors wish to express their appreciation to Paul R. Lohnes,
who encouraged and guided the present effort while they were Office of
Education Post-Doctoral Fellows (O.E.G. 1-6-062084) at Project TALENT.



INSERT FIGURE 1 



In the case pictured in Figure 1, it is clear that most of
set A is contained in set B, whereas a relatively large portion of set
B is outside the intersection. This paper proposes an index, based on
canonical correlation, which is nonsymmetric and has been worthwhile
in the analysis of various partitioned matrices.

If we were to factor analyze two sets of variables independently
and then develop weights which would rotate the two factor structures
to maximum correlation, we would have a canonical solution (Hotelling,
1935). In the canonical case the factors are usually referred to as
canonical variates. The correlation between the first factor of the
left set and the first factor of the right set is the first canonical
correlation ($R_c$). In order to take advantage of the well-developed
language of factor analysis, we shall call them canonical factors.

Since the complete factor structure of a set of variables will
contain as many factors as there are variables,² it is obvious that if
the larger set is composed of five variables and the smaller set of
three variables, only three factors can be extracted from the smaller
set. As a result, $R_c$'s are available between three of the factors of
the larger set and the three factors of the smaller set. The remaining
two factors in the larger set have no counterpart in the smaller set
and do not enter into the canonical solution.

²This is only true where the rank of the matrix equals the
order. In general this is the case, and it will be assumed in this paper.

In the traditional interpretation of canonical correlations,
the magnitude of the $R_c$'s, whether or not they are significantly
nonzero, and the weights used to obtain the $R_c$'s are considered (Cooley
and Lohnes, 1962). The interpretation of these weights has all the
problems attendant on the beta weights of common multiple regression.
At the suggestion of Meredith (1964), some investigators now compute
the correlations between the variables in a set and the canonical
factors of that set (the factor loadings of factor-analytic parlance).³
Before we consider a method of calculating an index of
redundancy, we should agree on vocabulary. We need one index for the
redundancy in the left set given the right and another index for the
reverse relation. For the sake of simplicity, we will consider one
set of variables as the predictor or conditioning set and the other
set as the criterion, as in multiple regression. We talk about the
proportion of variance in the criterion accounted for by the predictors,
but seldom if ever consider the reverse relationship. It is obvious
that by reversing our definition of criterion and predictor we could
develop the index going in the other direction. The canonical factors
of the predictor set will be $FP_i$ and, similarly, $FC_i$ for the criterion
set. The variables of the predictor and criterion sets will be $P_j$
and $C_j$, respectively. Since the index about to be proposed uses
the concept of a factor extracting a proportion of the variance (more
appropriately, proportion of trace) of a set of variables (usually a
battery of tests), we will define the column sum of the squared
loadings of variables within a set on a canonical factor of the set
as the variance extracted by that factor. When this is divided by the
number of variables in the set ($M$), the resulting value is the
proportion of the variance of the set extracted by that canonical factor.
This will be symbolized as $VP_i$ and $VC_i$. The squared canonical
correlations ($R_c^2$) will be written as $\lambda_i$ (following Cooley and Lohnes,
1962). This is the proportion of variance in one of the $i$th pair of
canonical variates predictable from the other member of the pair. If
$VC_i$ is multiplied by $\lambda_i$, the resulting figure is the proportion
of the variance of the C set explained by the correlation between $FP_i$ and
$FC_i$. If this value is calculated for each of the $M_c$ pairs of canonical
factors, the result is an index of the proportion of variance of C
predictable from P, or the redundancy in C given P.

³This proposal will be used in the forthcoming second
edition of Cooley and Lohnes.



$$\bar{R} = \sum_{k=1}^{M_c} \lambda_k \, VC_k = \sum_{k=1}^{M_c} \lambda_k \, \frac{\sum_{j=1}^{M_c} L_{jk}^{2}}{M_c}$$

(where L_{jk} is the correlation between the jth variable and the kth canonical factor.)
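The index can be computed directly from a loading matrix and the canonical correlations. The following is a minimal Python sketch, not the authors' program; the function name `redundancy` and the two-variable example are hypothetical. The example assumes a criterion set of two uncorrelated variables that is completely factored by two canonical factors, only the first of which is correlated (R_c = .5) with the predictor set.

```python
def redundancy(loadings, canonical_rs):
    """Redundancy index R-bar of a criterion set C given a predictor set P.

    loadings[j][k]  : correlation of criterion variable j with canonical factor k of C
    canonical_rs[k] : canonical correlation of pair k; use 0.0 for any factor
                      of C that has no counterpart in P
    """
    m_c = len(loadings)                  # number of variables in C
    total = 0.0
    for k, r_c in enumerate(canonical_rs):
        # VC_k: column sum of squared loadings divided by M_c
        vc_k = sum(row[k] ** 2 for row in loadings) / m_c
        total += (r_c ** 2) * vc_k       # lambda_k * VC_k
    return total


# Hypothetical loadings of two criterion variables on two canonical factors
L = [[0.8, 0.6],
     [0.6, -0.8]]
print(redundancy(L, [0.5, 0.0]))   # about .125: .25 * .5 from the first factor only
print(redundancy(L, [1.0, 1.0]))   # about 1.0: every lambda = 1, full factoring
```

The second call illustrates the condition under which the index reaches its upper limit: every λ equals 1 and the canonical factors extract all of the variance of the set.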
We have called this index R̄ (R bar) because it was noted that if a multiple R² were computed between the total P set and each variable of the C set,

$$\bar{R} = \frac{1}{M_c} \sum_{j=1}^{M_c} R_j^{2}.$$

In other words, R̄ is the mean squared multiple correlation. The possible range of R̄ is from 0.0 to +1.0.³
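One way to see why R̄ equals the mean squared multiple correlation is sketched below in matrix notation. The notation is ours, not the authors': R_pp, R_cc, and R_cp = R_pc' are the correlation blocks, B holds the weights of the canonical factors of C, L = R_cc B holds their loadings, and Λ is the diagonal matrix of the λ_k. The sketch assumes standardized variables, a nonsingular R_cc, and that all M_c canonical factors of C are retained, with λ_k = 0 for factors that have no counterpart in P. The weights then satisfy R_cp R_pp^{-1} R_pc B = R_cc B Λ and B' R_cc B = I, so that B B' = R_cc^{-1}:

$$
\begin{aligned}
\sum_{k=1}^{M_c} \lambda_k \sum_{j=1}^{M_c} L_{jk}^{2}
  &= \operatorname{tr}\left(L \Lambda L'\right)
   = \operatorname{tr}\left(R_{cc} B \Lambda B' R_{cc}\right) \\
  &= \operatorname{tr}\left(R_{cp} R_{pp}^{-1} R_{pc} \, B B' R_{cc}\right)
   = \operatorname{tr}\left(R_{cp} R_{pp}^{-1} R_{pc}\right)
   = \sum_{j=1}^{M_c} R_j^{2}
\end{aligned}
$$

The jth diagonal element of R_cp R_pp^{-1} R_pc is the squared multiple correlation of C_j on the P set, so dividing both sides by M_c gives R̄ = Σ λ_k VC_k = (1/M_c) Σ R_j².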

An example of the use of canonical correlation is presented by Lohnes and Marshall (1965).⁴ In this study, three scores from the Pintner General Ability Test (PGAT) and ten from the Metropolitan Achievement Test were entered into a canonical correlation with the 7th- and 8th-year course grades in English, arithmetic, social studies, and science of 230 junior high school students in a small, rural college town. The first two canonical correlations were reported (R_c1 = .90 and R_c2 = .66). The canonical weights were reported and interpreted.

In the present analysis of the Lohnes-Marshall data, the weights were ignored and the factor loadings and R̄'s were inspected.



INSERT TABLE 1 



In the left set, loadings (in absolute value) from .707 to .917 are found on the first factor. The loadings on the second factor drop substantially. The same condition holds in the right set. In Table 2, columns I and II present the canonical correlations and their squares. Note that the upper portion of Table 2 treats the left set as the criterion and the right set as the predictor set, while the lower portion reverses these roles. The third column of Table 2 presents the proportion of the variance of the set extracted by each canonical factor (variate). The fourth column is the amount of redundant variance attributed to each canonical factor. The fifth column expresses the values in the fourth column as proportions of the total redundancy.

³ It should be noted that if M_c < M_p, then R̄ ≤ 1.0. If R̄ is calculated for P and M_c < M_p, then R̄ < 1.00. The only time R̄ can equal 1.0 is when each λ = 1.00 and the canonical factors of the set in question extract 100 percent of the generalized variance in that set.

⁴ Professor Paul R. Lohnes graciously allowed us to use his data and modified his latest canonical program to calculate our index.

From this we see that: 

1. The eight canonical factors extract 90 percent of the variance of the left set;

2. Fifty-nine percent of the variance of the left set is predicted by the variance in the right set (i.e., R̄ = .59);

3. Of the redundant variance, 93 percent is associated with the first canonical variate;

4. Despite the large value of R_c2 = .66, the second canonical variates have very small amounts of variance associated with them (5 percent in both the left and right sets);

5. The eight canonical factors of the right (and smaller) set extract 100 percent of the variance of that set (which is simply to assert that the smaller set is completely factored in the canonical solution);

6. The redundancy of the right set (student grades) given the left set is R̄ = .61; and

7. Of the redundant variance of the right set, 92 percent is associated with the first canonical variate.
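Findings 1 through 4 can be checked from columns II and III of the left-set portion of Table 2. The following minimal Python sketch uses those printed (rounded) values, so its results agree with the table only to about the third decimal; the variable names are ours.

```python
# Table 2, left set as criterion: column II (lambda = R_c squared) and column III (VC)
lam = [0.814, 0.439, 0.251, 0.151, 0.096, 0.078, 0.022, 0.005]
vc = [0.668, 0.049, 0.038, 0.039, 0.022, 0.038, 0.025, 0.020]

redundancy = [l * v for l, v in zip(lam, vc)]   # column IV: lambda_k * VC_k
r_bar = sum(redundancy)                         # total redundancy, R-bar
share = [r / r_bar for r in redundancy]         # column V: share of total redundancy

print(f"variance extracted from left set = {sum(vc):.3f}")
print(f"R-bar, left set given right set = {r_bar:.3f}")
print(f"share of redundancy: factor 1 = {share[0]:.3f}, factor 2 = {share[1]:.3f}")
```

The second factor, despite R_c = .66, carries under 4 percent of the redundant variance because it extracts only about 5 percent of the variance of the left set.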

The utility of R̄ is as a summary index; in general, it is not to be viewed as an analytic tool. Certain associated indices, however, have obvious analytic applications. For example, the proportion of redundant variance associated with a given factor is instructive in determining whether the factor deserves interpretation and further attention. In the case noted above, a canonical correlation of .66 was associated with only .05 of the variance of either side and only 4 percent of the redundant variance. In short, this index instructs us differently than does the canonical correlation alone.
TABLE 1

FACTOR STRUCTURE FOR LEFT SET. COLUMNS ARE CANONICAL FACTORS. ROWS ARE TESTS.

Test       1       2       3       4       5       6       7       8
   1   -.786    .061   -.082   -.313    .054    .163   -.251    .026
   2   -.828   -.163    .018   -.191   -.082    .174   -.276    .031
   3   -.707   -.462    .009   -.444    .066   -.102    .018   -.152
   4   -.800   -.031    .178   -.095   -.071    .451   -.026    .050
   5   -.817    .061    .169   -.194    .003    .311   -.136   -.340
   6   -.887    .185   -.096    .074   -.080    .005   -.081   -.005
   7   -.917    .119   -.055   -.148    .205   -.016    .120    .050
   8   -.836   -.066    .088   -.245   -.046    .082   -.001    .210
   9   -.903   -.212   -.086    .099    .083   -.042    .069   -.182
  10   -.839   -.351    .016   -.006   -.022    .008    .160   -.136
  11   -.752    .048    .581   -.123    .063    .053   -.105   -.113
  12   -.798   -.360    .136    .011    .065   -.076   -.243    .096
  13   -.726   -.190    .218   -.126    .447    .321   -.198   -.023

FACTOR STRUCTURE FOR RIGHT SET. COLUMNS ARE FACTORS. ROWS ARE TESTS.

Test       1       2       3       4       5       6       7       8
   1   -.847   -.322   -.065    .094    .212   -.326   -.033   -.119
   2   -.795   -.446   -.014   -.067   -.230    .255    .117   -.182
   3   -.951    .140    .011   -.108    .095    .046   -.099    .206
   4   -.878    .241   -.011    .025   -.194   -.055   -.057   -.354
   5   -.901    .127    .315    .227    .080    .073    .093   -.002
   6   -.743    .001    .540   -.134   -.189   -.021   -.180   -.263
   7   -.800    .027    .088   -.222    .412   -.111    .195   -.288
   8   -.727   -.079    .209    .034    .063    .335   -.361   -.416






TABLE 2. Components of Redundancy Measure

Columns: I = canonical R (R_c); II = R-squared (λ); III = variance extracted (VC); IV = redundancy (λ · VC); V = proportion of total redundancy.

LEFT SET

Factor        I      II     III      IV       V
     1    .9021    .814    .668    .544    .927
     2    .6625    .439    .049    .022    .037
     3    .5015    .251    .038    .010    .016
     4    .3886    .151    .039    .006    .010
     5    .3098    .096    .022    .002    .004
     6    .2785    .078    .038    .003    .005
     7    .1500    .022    .025    .001    .001
     8    .0722    .005    .020    .000    .000

Total variance extracted from left set = .899
R̄, total redundancy for left set, given right set = .586

RIGHT SET

Factor        I      II     III      IV       V
     1    .9021    .814    .695    .566    .923
     2    .6625    .439    .050    .022    .036
     3    .5015    .251    .056    .014    .023
     4    .3886    .151    .018    .003    .004
     5    .3093    .096    .045    .004    .007
     6    .2785    .078    .038    .003    .005
     7    .1500    .022    .030    .001    .001
     8    .0722    .005    .068    .000    .001

Total variance extracted from right set = 1.000
R̄, total redundancy for right set, given left set = .613
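The right-set portion of Table 2 can be re-derived from the right-set loadings in Table 1. The minimal Python sketch below uses the loadings and canonical correlations exactly as printed; because the loadings are rounded to three decimals, agreement with the table is expected only to about the third decimal. Row sums of squared loadings (communalities) near 1.0 show that the smaller set is completely factored, and column sums divided by M_c give column III. The variable names are ours.

```python
# Table 1, right set: 8 course grades (rows) x 8 canonical factors (columns)
right = [
    [-.847, -.322, -.065,  .094,  .212, -.326, -.033, -.119],
    [-.795, -.446, -.014, -.067, -.230,  .255,  .117, -.182],
    [-.951,  .140,  .011, -.108,  .095,  .046, -.099,  .206],
    [-.878,  .241, -.011,  .025, -.194, -.055, -.057, -.354],
    [-.901,  .127,  .315,  .227,  .080,  .073,  .093, -.002],
    [-.743,  .001,  .540, -.134, -.189, -.021, -.180, -.263],
    [-.800,  .027,  .088, -.222,  .412, -.111,  .195, -.288],
    [-.727, -.079,  .209,  .034,  .063,  .335, -.361, -.416],
]
# Table 2, right-set portion, column I (canonical R)
canonical_r = [.9021, .6625, .5015, .3886, .3093, .2785, .1500, .0722]

m_c = len(right)
# Row sums of squares: variance of each grade reproduced by all 8 factors
communality = [sum(x * x for x in row) for row in right]
# Column sums of squares / M_c: VC_k, proportion of set variance per factor
vc = [sum(row[k] ** 2 for row in right) / m_c for k in range(len(canonical_r))]
# Redundancy of the right set given the left set
r_bar = sum(r * r * v for r, v in zip(canonical_r, vc))

print("communalities:", [round(h, 3) for h in communality])
print("VC:", [round(v, 3) for v in vc])
print("R-bar:", round(r_bar, 3))
```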
Attachment C

Announcement of Training Fellowships




AMERICAN INSTITUTES FOR RESEARCH 



Institute for Research in Education 
Project TALENT Training Fellowships 



Beginning September 1, 1966, Project TALENT is offering a postdoctoral program for training in computer and multivariate applications to educational research. Participants will explore a particular area of research using Project TALENT data and participate in the following seminars:

(1) Project TALENT research

(2) Computer applications to educational research

(3) Statistical analysis including multivariate statistics

(4) Research methodology applicable to large-scale educational research



Financial support from the Office of Education permits a stipend of $8,500 and relocation costs. Final selection of fellows for academic year 1966-67 will be made on May 31, 1966.



Professional Staff includes:

William W. Cooley, Project Director
Marion F. Shaycoft, Associate Director
Paul R. Lohnes, Director of Guidance Studies
Charles E. Kali, Director of School Studies
Bary G. Wingersky, Director of Computer Systems
Lyle F. Schoenfeldt, Data Bank Coordinator

This program is primarily designed for those who are now holding, or who plan to hold, positions at colleges and universities which involve the training of educational research workers.



Interested individuals should contact: 
William W. Cooley 
Director of Project TALENT 
135 North Bellefield Avenue 
Pittsburgh, Pennsylvania 15213 



Project TALENT is a longitudinal study of American high school students which is investigating factors influencing educational and vocational choices. In March 1960, tests were given to 440,000 students in 1,353 secondary schools. These students are being followed up one, five, ten, and twenty years following graduation from high school.



